July 1st, 2025

# Introduction

Replication Data & Political Influence Scores from:

Francis, David and Kubinec, Robert. "Beyond Political Connections: A Measurement Model Approach to Estimating Firm-level Political Influence in 41 Countries". *Political Science Research and Methods.* Forthcoming.

This repository contains the code needed to replicate the results in our paper, instructions for obtaining the data, and up-to-date political influence scores for the latest World Bank surveys and versions of `idealstan`. The repository itself contains only the political influence scores and secondary data files; the primary firm survey data must be downloaded from the World Bank Enterprise Surveys data portal. Specifically, the main file with firm survey data and political influence scores can be downloaded from the portal at <https://login.enterprisesurveys.org/content/sites/financeandprivatesector/en/signin.html> (if you do not have an account, you will need to register first).
 
Once logged in to the portal, click on the tab `COMBINED DATA`, and then click on the link for the file: `BeyondPoliticalConnections-data.rar` and then proceed to download. To open the compressed file on Mac OS X, use the Unarchiver app (<https://apps.apple.com/us/app/the-unarchiver/id425424353?mt=12>). Inside the compressed folder are two files:

1. `Political Influence_public.dta`: contains the median, 95th percentile, 5th percentile, standard error, and variance of individual index scores. The variable `idstd` is the World Bank unique identifier for companies.
 
Note: users wishing to merge the scores with any external Enterprise Survey data set including `idstd` should first decode (i.e., swap values and labels) the `idstd` variable in `Political Influence_public.dta`.
 
2. `data_for_analysis.dta`: contains all the firm survey data mentioned in the paper with the firm influence scores merged in (as posterior median summaries plus quantiles). To re-create the paper results, place this file in the `In/Processed data` folder described below.

`combined_data.dta`: This file contains some additional variables that were used in creation of the index. To re-estimate the index, it should be placed in the `In/` folder.

We will keep these files up to date as we estimate new versions of the scores for future World Bank surveys. Up-to-date political influence scores with the `idstd` variable to merge with World Bank Enterprise Survey data are also available from this repo.
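As a rough illustration of the merge on `idstd`, the sketch below uses two small made-up data frames in place of the real score and survey files. Every column name other than `idstd` is hypothetical, and the values are invented; in practice both `idstd` columns should also be of the same type before merging.

```r
# Toy stand-ins for the score file and an Enterprise Survey file.
# All column names except idstd are hypothetical; values are invented.
scores <- data.frame(
  idstd = c(101, 102, 103),
  score_median = c(-0.4, 0.1, 1.2)
)
survey <- data.frame(
  idstd = c(102, 103, 104),
  country = c("Albania", "Albania", "Armenia")
)

# Left join: keep every surveyed firm and attach a score where one exists.
merged <- merge(survey, scores, by = "idstd", all.x = TRUE)

# Firms without a matching score get NA, which makes failed matches easy to count.
n_unmatched <- sum(is.na(merged$score_median))
print(merged)
```

Keeping all survey rows (`all.x = TRUE`) rather than only the matched ones is a choice made here so that unmatched firms stay visible instead of being silently dropped.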

If your only intention in accessing this repository is to download the influence scores, we recommend using the link above as the rest of this repository is dedicated to reproducing our original index & results.

## Replication of Original Results

Fully replicating the results in the paper requires estimating the political influence scores with R/`idealstan` and then performing inference with Stata. As such, the repository has both R and Stata code. This file describes how to fully replicate the results. However, there are certain caveats:

1. The individual country firm surveys and the aggregated survey data used for this study are the property of the World Bank and can only be accessed through the World Bank Enterprise Survey website: <https://login.enterprisesurveys.org/content/sites/financeandprivatesector/en/signin.html>. As described at the beginning of this README, all that is necessary to reproduce our analysis is to download the `data_for_analysis.dta` file and put it in the `In/Processed data` folder in this repository. For those interested in reproducing the results from the original files, the raw files are listed at the bottom of this README; each must be downloaded individually.

2. While we cannot directly share the raw files, the political influence scores along with Enterprise Survey id variables are included in this repository, which can be merged with firm surveys downloaded from the link above. These estimated indexes are in the `In` folder in RDS form (`scores_sum.rds`) and in CSV form (`scores_sum.csv`). The indexes are also available with individual posterior draws as `scores_raw.rds` in the same folder. These files have the `idstd` column to allow them to be merged with Enterprise Survey data. Note that we keep these scores updated to the most recent version; if you would like the version that came with the article, please see instructions below to re-estimate them.

3. While we set seeds, re-estimating the index with `idealstan` introduces additional Monte Carlo error from the stochastic process used to explore posterior geometry. This error is quite small but does not disappear when the index is re-estimated. Further error can be introduced if the imputation procedure is re-run; again, this error is quite small, but it can change estimated coefficients slightly. The file `data_for_analysis.dta` in the In folder contains the scores used for the paper, whereas re-running the R scripts will use a replicated index that differs slightly.
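To make the last caveat concrete, the toy sketch below simulates posterior draws for three hypothetical firms, collapses them to a median with 5th and 95th percentiles, and repeats the exercise under a second seed. It does not use `idealstan` and does not reflect the actual layout of `scores_raw.rds`; the firm ids, means, and column names are all invented for illustration.

```r
# Simulate toy posterior draws and summarise them per firm.
# Firm ids, means, and column names are hypothetical.
simulate_summary <- function(seed, n_draws = 1000) {
  set.seed(seed)
  draws <- data.frame(
    idstd = rep(c(101, 102, 103), each = n_draws),
    draw = rnorm(3 * n_draws, mean = rep(c(-0.5, 0, 1), each = n_draws), sd = 0.2)
  )
  # One row per firm: posterior median plus 5th and 95th percentiles.
  summ <- aggregate(draw ~ idstd, data = draws, FUN = function(x) {
    c(median = median(x),
      q05 = unname(quantile(x, 0.05)),
      q95 = unname(quantile(x, 0.95)))
  })
  # aggregate() stores the three summaries as a matrix column; flatten it.
  data.frame(idstd = summ$idstd, summ$draw)
}

run_a <- simulate_summary(seed = 1)
run_b <- simulate_summary(seed = 2)

# The same model summarised under two seeds gives slightly different medians.
max_shift <- max(abs(run_a$median - run_b$median))
print(run_a)
print(max_shift)
```

The difference between the two runs is small relative to the width of each firm's interval, which is the sense in which a replicated index can differ slightly from the published one.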

Given these caveats, please follow the instructions below to reproduce the analysis (assuming that the data files are available):

# Re-estimating the index

This repository contains an `renv` folder with the packages used to estimate the political influence scores with `idealstan`. A specific version of `idealstan` must be used along with a specific version of `cmdstan` and `cmdstanr` (the Monte Carlo sampler). Of course, the index can be estimated with the current version of `idealstan`, but the latent scores will be different due to changes in the `idealstan` parameterization (though the sign of parameters should be the same).

Upon opening the folder and the R project file (`wbsurvey.Rproj`), `renv` should run a script in the console; if it does not, run the script `renv/activate.R`. This file will tell you which packages need to be installed (if any) to match what was used to create the index. If `renv` reports any issues, please run `renv::restore()` to re-install any necessary packages, then `renv::install()` and `renv::snapshot()`.

Once the packages are installed, you must manually install a specific version of `cmdstan` via `cmdstanr` with the following command in the R console:

```
cmdstanr::install_cmdstan(version="2.32.1")
```

Note that you must have the necessary code tools (such as Xcode developer tools on Mac OS X) to run `cmdstan`. For more information, see <https://mc-stan.org/cmdstanr/articles/cmdstanr.html>.
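One way to confirm that the expected CmdStan version is the one `cmdstanr` will use is sketched below. The helper `version_matches()` is a hypothetical name introduced here, not part of the repository; the sketch only reports what it finds and skips the check if `cmdstanr` is not installed.

```r
required_version <- "2.32.1"  # the version named in this README

# Hypothetical helper: TRUE when an installed version string equals the required one.
version_matches <- function(installed, required = required_version) {
  !is.null(installed) && numeric_version(installed) == numeric_version(required)
}

if (requireNamespace("cmdstanr", quietly = TRUE)) {
  # With error_on_NA = FALSE, NULL is returned when no CmdStan path is set.
  installed <- cmdstanr::cmdstan_version(error_on_NA = FALSE)
  if (version_matches(installed)) {
    message("CmdStan ", installed, " is active.")
  } else {
    message("Active CmdStan is not version ", required_version, ".")
  }
} else {
  message("cmdstanr is not installed; restore the renv library first.")
}
```

If the toolchain itself is in doubt, `cmdstanr::check_cmdstan_toolchain()` can be run separately to test the compiler setup.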

Once you have completed this task, you can run the R scripts in the root folder in the following order:

1. `estimate_index.R`
2. `gen_predict_score.R`

Note that you must set a variable `save_loc` in each script to the correct folder containing the repo files for the code to run. Fitting the index will take approximately an hour or two depending on the machine it is run on. These scripts will save the index files mentioned above in the In/ folder and also update the `data_for_analysis.dta` file with the new index values. Again, as mentioned above, the replicated indexes are likely to differ slightly from the original firm scores. 
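Before starting a run that takes an hour or more, it may help to confirm that `save_loc` points at a folder containing the expected inputs. The sketch below assumes `save_loc` is the repository root and that the scripts read `In/combined_data.dta` and `In/Processed data/data_for_analysis.dta`; how the scripts actually build their paths is not documented here, and `missing_inputs()` is a hypothetical helper. A temporary folder stands in for the repository so the example can be run anywhere.

```r
# Hypothetical helper: return the expected input files that are absent under save_loc.
missing_inputs <- function(save_loc) {
  expected <- c(
    file.path(save_loc, "In", "combined_data.dta"),
    file.path(save_loc, "In", "Processed data", "data_for_analysis.dta")
  )
  expected[!file.exists(expected)]
}

# Demonstration: a temporary folder stands in for the repository root.
save_loc <- file.path(tempdir(), "repo_demo")
dir.create(file.path(save_loc, "In", "Processed data"),
           recursive = TRUE, showWarnings = FALSE)

# Create an empty placeholder for only one of the two files.
file.create(file.path(save_loc, "In", "combined_data.dta"))

still_missing <- missing_inputs(save_loc)
print(basename(still_missing))
```

In real use, `save_loc` would be set to the local path of the cloned repository and the helper called before sourcing the estimation scripts.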

The `idealstan` model object will be saved in the In folder as `wb_fit_replication.rds` for further inspection if need be.

# Stata Replication

This section describes the output that is produced in Stata for the article "Beyond Political Connections: A Measurement Model Approach to Estimating Firm-level Political Influence in 41 Countries" (Francis and Kubinec 2025).

## The construction of data 

All do files can be found in the subfolder `Stata_do files`.

The following two files must be customized by users. `_MyPaths_[CUSTOM].do` can be run from `_Main.do`.

1. `_Main.do` (all subsequent files are run through `_Main.do`; users must customize it)
2. `_MyPaths_[CUSTOM].do` (sets all directory and file paths; must be customized by users)

**The below do files do not need to be customized by users and directly replicate analysis in the manuscript. We note that there are a few discrepancies in coefficient values from the manuscript caused by changes in Stata estimation routines (as far as we are able to tell), though these are minor.**

1. Figure 1 produced by:
```
	run "${dofiles}\\Figure1_Item Discrimination Parameters.do"
```

2. Figure 2 produced by:
```
	run "${dofiles}\\Figure2_Distribution of Predicted Firm-level Political Influence by Size.do"
```

3. Figure 3 produced by:
```
	run "${dofiles}\\Figure3_Distribution of Political Influence Index.do"
```

4. Table 3 produced by:
```
	run "${dofiles}\\Table3_Summary Statistics across All Firms.do"
```

5. Figure 4 produced by:
```
	run "${dofiles}\\Figure4_Distributions of Political Influence Scores by Country.do"
```

6. Table 4 produced by:
```
	run "${dofiles}\\Table4_Mean Summary Statistics Across All countries.do"
```

7. Figure 5 produced by:
```
	run "${dofiles}\\Figure5_olitical Influence Distributions at High and Low Levels of Voice and Accountability.do"
```

8. Figure 6 produced by:
```
	run "${dofiles}\\Figure6_BetaOLS and BetaRIF at Quantiles of the Political Influence Score.do"
```

9. Table 5 produced by:
```
	run "${dofiles}\\Table5_RIF-OLS, Political Influence Relative to the Country Median.do"
```

## Stata Dataset Construction

This section documents the process for producing the file `data_for_analysis.dta`. The code that produces that file is in `Stata_do files/Additional data construction/0_prelim_data setup.do`.

`0_prelim_data setup.do` is provided in the replication package. The accompanying files `0_main_org.do`, `1_MNA_vars.do`, `2_MNA_org`, and `_combined_full_data.do` are necessary but are not publicly available.

Individual economy datasets are available via <https://login.enterprisesurveys.org/content/sites/financeandprivatesector/en/signin.html> and can be merged with our political influence firm scores via the `idstd` variable (`scores_sum.csv` in `In/Processed data`). Note that you must create an account with the Enterprise Survey website to download the firm survey files.

These datasets are occasionally updated (though with minimal changes). All changes to publicly available datasets are documented on the www.enterprisesurveys.org website.

In addition, the below list includes the dates of download for each economy/country dataset used in the manuscript: 

File Name                                              | Date
--------------------------------------------------------|----------
Albania-2019-full data.dta                             | 04/23/2020
Armenia-2020-full data.dta                             | 09/17/2020
Azerbaijan-2019-full data.dta                          | 08/19/2020
Belarus-2018-full data.dta                             | 04/23/2020
Bosnia and Herzegovina-2019-full data.dta              | 04/23/2020
Bulgaria-2019-full data.dta                            | 05/13/2020
Croatia-2019-full data.dta                             | 04/23/2020
Czech Republic-2019-full data.dta                      | 04/23/2020
Egypt-2020-full data.dta                               | 08/19/2020
Estonia-2019-full data.dta                             | 04/23/2020
Georgia-2019-full data.dta                             | 04/23/2020
Greece-2018-full data.dta                              | 04/23/2020
Hungary-2019-full data.dta                             | 06/18/2020
Italy-2019-full data.dta                               | 04/23/2020
Jordan-2019-full data.dta                              | 04/16/2020
Kazakhstan-2019-full data.dta                          | 04/23/2020
Kosovo-2019-full data.dta                              | 04/23/2020
Kyrgyz Republic-2019-full data.dta                     | 04/23/2020
Latvia-2019-full data.dta                              | 04/23/2020
Lebanon-2019-full data.dta                             | 05/13/2020
Lithuania-2019-full data.dta                           | 04/23/2020
Malta-2019-full data.dta                               | 04/23/2020
Moldova-2019-full data.dta                             | 04/23/2020
Mongolia-2019-full data.dta                            | 04/23/2020
Montenegro-2019-full data.dta                          | 04/23/2020
Morocco-2019-full data.dta                             | 04/16/2020
North Macedonia-2019-full data.dta                     | 04/23/2020
Poland-2019-full data.dta                              | 04/23/2020
Portugal-2019-full data.dta                            | 04/23/2020
Republic of Cyprus-2019-full data.dta                  | 04/23/2020
Romania-2019-full data.dta                             | 08/19/2020
Russia-2019-full data.dta                              | 04/23/2020
Serbia-2019-full data.dta                              | 04/23/2020
Slovakia-2019-full data.dta                            | 05/13/2020
Slovenia-2019-full data.dta                            | 04/23/2020
Tajikistan-2019-full data.dta                          | 04/23/2020
Tunisia-2020-full data.dta                             | 10/15/2020
Turkey-2019-full data.dta                              | 04/23/2020
Ukraine-2019-full data.dta                             | 04/23/2020
Uzbekistan-2019-full data.dta                          | 04/23/2020
West Bank and Gaza-2019-full data.dta                  | 04/16/2020 